首页> 外文OA文献 >Parallel Algorithms for Bayesian Networks Structure Learning with Applications in Systems Biology
【2h】

Parallel Algorithms for Bayesian Networks Structure Learning with Applications in Systems Biology

机译:贝叶斯网络结构学习的并行算法及其在系统生物学中的应用

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

The expression levels of thousands to tens of thousands of genes in a living cell are controlled by internal and external cues which act in a combinatorial manner that can be modeled as a network. High-throughput technologies, such as DNA-microarrays and next generation sequencing, allow for the measurement of gene expression levels on a whole-genome scale. In recent years, a wealth of microarray data probing gene expression under various biological conditions has been accumulated in public repositories, which facilitates uncovering the underlying transcriptional networks (gene networks). Due to the high data dimensionality and inherent complexity of gene interactions, this task inevitably requires automated computational approaches.Various models have been proposed for learning gene networks, with Bayesian networks (BNs) showing promise for the task. However, BN structure learning is an NP-hard problem and both exact and heuristic methods are computationally intensive with limited ability to produce large networks. To address these issues, we developed a set of parallel algorithms. First, we present a communication efficient parallel algorithm for exact BN structure learning, which is work-optimal provided that 2^n \u3e p.log(p), where n is the total number of variables, and p is the number of processors. This algorithm has space complexity within 1.41 of the optimal. Our empirical results demonstrate near perfect scaling on up to 2,048 processors. We further extend this work to the case of bounded node in-degree, where a limit d on the number of parents per variable is imposed. We characterize the algorithm\u27s run-time behavior as a function of d, establishing the range [n/3 - log(mn), ceil(n/2)) of values for d where it affects performance. Consequently, two plateaus regions are identified: for d \u3c n/3 - log(mn), where the run-time complexity remains the same as for d=1, and for d \u3e= ceil(n/2), where the run-time complexity remains the same as for d=n-1. Finally, we present a parallel heuristic approach for large-scale BN learning. This approach aims to combine the precision of exact learning with the scalability of heuristic methods. Our empirical results demonstrate good scaling on various high performance platforms. The quality of the learned networks for both exact and heuristic methods are evaluated using synthetically generated expression data. The biological relevance of the networks learned by the exact algorithm is assessed by applying it to the carotenoid biosynthesis pathway in Arabidopsis thaliana.
机译:活细胞中成千上万个基因的表达水平受内部和外部线索控制,这些线索以组合方式起作用,可以建模为网络。高通量技术(例如DNA微阵列和下一代测序)可用于测量全基因组规模的基因表达水平。近年来,在公共存储库中积累了许多在各种生物学条件下探测基因表达的微阵列数据,这有助于揭示潜在的转录网络(基因网络)。由于高数据维度和基因相互作用的内在复杂性,该任务不可避免地需要自动化的计算方法。已经提出了用于学习基因网络的各种模型,其中贝叶斯网络(BN)证明了该任务的前景。但是,BN结构学习是一个NP难题,并且精确方法和启发式方法都需要大量计算,且生成大型网络的能力有限。为了解决这些问题,我们开发了一组并行算法。首先,我们提出了一种用于精确BN结构学习的通信有效并行算法,该算法是工作最优的,条件是2 ^ n p.log(p),其中n是变量的总数,p是处理器的数量。该算法的空间复杂度在最佳值的1.41以内。我们的经验结果表明,在多达2,048个处理器上几乎可以完美缩放。我们将这项工作进一步扩展到有界节点度数的情况,其中对每个变量的父代数施加限制d。我们将算法的运行时行为描述为d的函数,建立了d的值范围[n / 3-log(mn),ceil(n / 2)),在此范围内它会影响性能。因此,确定了两个高原区域:对于d \ u3c n / 3-log(mn),运行时复杂度与对于d = 1相同,对于d \ u3e = ceil(n / 2),其中运行时复杂度与d = n-1相同。最后,我们提出了一种用于大规模BN学习的并行启发式方法。这种方法旨在将精确学习的精度与启发式方法的可扩展性相结合。我们的经验结果证明了在各种高性能平台上的良好扩展能力。使用合成生成的表达数据评估精确和启发式方法的学习网络质量。通过将其应用到拟南芥中的类胡萝卜素生物合成途径,可以评估通过精确算法学习的网络的生物学相关性。

著录项

  • 作者

    Nikolova, Olga;

  • 作者单位
  • 年度 2012
  • 总页数
  • 原文格式 PDF
  • 正文语种 en
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号